Visual experience map for media presentations

ABSTRACT

Systems and methods for video editing and playback are provided. In one implementation, a selected portion of a timeline for navigating media content can be repositioned and resized by user input actions received along various axes relative to the timeline. In another implementation, a plurality of signals associated with media content can be intelligently weighted based on user group historical attributes to identify portions of interest in the media content. In a further implementation, an experience map for media content is provided in which a representative signature for the content includes visual signal intensity representations and social interest concentrations over the length of the content. In another implementation, a subset of filters is determined for recommendation to a user based on one or more attributes associated with at least one of media content, the user, a group of users, or a user device.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of U.S. Provisional Patent Application No. 62/014,203, filed on Jun. 19, 2014; U.S. Provisional Patent Application No. 62/047,553, filed on Sep. 8, 2014; and U.S. Provisional Patent Application No. 62/131,455, filed on Mar. 11, 2015, the entireties of which are incorporated by reference herein.

BACKGROUND

The present disclosure relates generally to media curation, editing and playback and, more particularly, to systems and methods for identifying portions of interest in audio and video, manipulating media segments using a simplified interface, and forming a representative signature for audio and video based on content signal intensity and social interest.

Creators of media content often generate substantially more content than is needed or used in a final audio and/or video production. Content creators may only be interested in showcasing the most interesting or relevant portions of their generated content to an audience. For example, a snowboarder may desire to exhibit his best tricks on video, while discarding intermediate portions of the video that show him boarding down a mountainside between jumps. While the snowboarder can upload his video and utilize complex video editing software to compile a highlights video once he has returned from the slopes, identifying interesting video segments, editing captured video, and sharing modified content in the midst of his excursion is exceedingly difficult. There is a need for systems and methods that facilitate the foregoing tasks for content creators.

BRIEF SUMMARY

Systems and methods for video editing and playback are disclosed herein. In one aspect, a computer-implemented method comprises: providing a visual representation of a timeline of media content wherein the timeline comprises a plurality of different time positions in the media content; indicating a selected portion of the timeline in the visual representation wherein the selected portion is a continuous region of the timeline bounded by a first border and a second border wherein each border corresponds to a different respective time position on the timeline; receiving a first user input action along a first axis of the timeline; changing a position of the selected portion in the visual representation along the timeline and based on the first user input action; receiving a second user input action along a second axis of the timeline; and resizing the selected portion in the visual representation based on the second user input action wherein resizing the selected portion comprises moving both of the respective time positions of the borders to be closer to each other or farther from each other. The media content can comprise at least one of video and audio. Other embodiments of this aspect include corresponding systems and computer programs.

In one implementation, the first axis is parallel to the timeline. The first axis can also be coaxial with the timeline, and the second axis can be perpendicular to the first axis. A particular user input action can include a touchscreen gesture, a mouse gesture, a tap, a click, a click-and-drag, a tracked free-hand gesture, a tracked eye movement, a button press, or an applied pressure.

In another implementation, the selected portion moves along the timeline simultaneously with receiving the first user input action. The first user input action can comprise a motion between a first point on the timeline and a second point on the timeline, and the selected portion can move from the first point to the second point in direct correspondence with the first user input action.

In one implementation, the second user input action comprises a motion along a first direction of the second axis, and the borders can be moved closer to each other simultaneously with receiving the second user input action. The second user input action can comprise a motion along a second direction, opposite the first direction, of the second axis, and the borders can be moved farther from each other simultaneously with receiving the second user input action.
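The repositioning and resizing behavior described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the `Selection` type, the `move` and `resize` functions, the clamping to the content duration, the `min_length` floor, and the choice to resize symmetrically about the center of the selected portion are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    """Hypothetical selected portion: two borders, in seconds."""
    start: float
    end: float


def move(sel: Selection, delta: float, duration: float) -> Selection:
    """Shift both borders by delta (input along the first axis).

    The length is preserved and the result is clamped to [0, duration].
    """
    length = sel.end - sel.start
    start = min(max(sel.start + delta, 0.0), duration - length)
    return Selection(start, start + length)


def resize(sel: Selection, delta: float, duration: float,
           min_length: float = 0.5) -> Selection:
    """Move both borders at once (input along the second axis).

    A positive delta moves the borders farther apart; a negative delta
    moves them closer together. Symmetry about the center is an assumption.
    """
    center = (sel.start + sel.end) / 2
    half = max((sel.end - sel.start) / 2 + delta, min_length / 2)
    return Selection(max(center - half, 0.0), min(center + half, duration))
```

For instance, under these assumptions, `resize(Selection(10.0, 20.0), -2.0, 60.0)` narrows the selected portion to the region from 12 to 18 seconds, while `move` leaves its length unchanged.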

In a further implementation, the method comprises receiving a third user input action along a third axis of the timeline; and splitting the media content into a plurality of selected portions based on a position of the selected portion on the timeline when the third user input action is received. The third axis can be perpendicular to the first axis and the second axis.

In yet another implementation, the timeline comprises visual indicators identifying portions of interest of the media content.

In another aspect, a computer-implemented method comprises: providing a visual representation of a timeline of media content wherein the timeline comprises a plurality of different time positions in the media content; indicating a selected portion of the timeline in the visual representation wherein the selected portion is a continuous region of the timeline bounded by a first border and a second border wherein each border corresponds to a different respective time position on the timeline; receiving a user input action along an axis perpendicular to the timeline; and resizing the selected portion in the visual representation based on the user input action wherein resizing the selected portion comprises moving both of the respective time positions of the borders to be closer to each other or farther from each other. Other embodiments of this aspect include corresponding systems and computer programs.

In one implementation, a particular user input action is a touchscreen gesture, a mouse gesture, a tap, a click, a click-and-drag, a tracked free-hand gesture, a tracked eye movement, a button press, or an applied pressure. The user input action can comprise a motion along a first direction of the axis, and the borders can be moved closer to each other simultaneously with receiving the user input action. The user input action can comprise a motion along a second direction, opposite the first direction, of the axis, and the borders can be moved farther from each other simultaneously with receiving the user input action.

In another implementation, the method further comprises generating second media content based on the selected portion, the second media content comprising at least a portion of the media content.

In another aspect, a computer-implemented method comprises: receiving a video comprising a plurality of signals, at least one signal representing an identifiable type of content over a length of the video; for at least one of the signals: identifying at least one intermediate portion of interest in the video based on the signal, and associating a weighting with the signal, wherein the weighting is determined based at least in part on historical attributes associated with at least one of an individual and a group of users; and identifying one or more overall portions of interest of the video based on the at least one intermediate portion of interest and the at least one signal weighting. Other embodiments of this aspect include corresponding systems and computer programs.

In one implementation, the identifiable type of content for a particular signal is selected from the group consisting of motion, sound, presence of faces, recognized faces, recognized objects, recognized activities, and recognized scenes. At least one of the signals can comprise sensor readings over a length of the video. The sensor can comprise an accelerometer, a gyroscope, a heart rate sensor, a compass, a light sensor, a GPS, or a motion sensor.

In another implementation, a particular intermediate portion of interest in the video is identified based on an intensity of the signal. Identifying a particular overall portion of interest of the video can comprise: combining the signals according to the respective weighting of each signal; identifying a portion of the combined signals that meets a threshold signal intensity; and identifying as the particular overall portion of interest a portion of the media content that corresponds to the identified portion of combined signals. Identifying a particular overall portion of interest of the video can also comprise: combining the signals according to the respective weighting of each signal; identifying a portion of the combined signals that comprises a high or low signal intensity relative to other portions of the combined signals; and identifying as the particular overall portion of interest a portion of the media content that corresponds to the identified portion of combined signals.
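One way to picture the threshold-based variant is a weighted sum of per-sample signal intensities followed by a scan for runs of samples that meet the threshold. The sketch below is illustrative only: the function names `combine` and `portions_above`, the representation of each signal as an equal-length list of intensities, and the use of a plain weighted sum are assumptions, not details taken from the disclosure.

```python
def combine(signals, weights):
    """Weighted sum of equal-length intensity sequences, sample by sample.

    signals: dict mapping a signal name to a list of intensities.
    weights: dict mapping the same names to numeric weightings.
    """
    names = list(signals)
    n = len(signals[names[0]])
    return [sum(weights[k] * signals[k][i] for k in names) for i in range(n)]


def portions_above(combined, threshold):
    """Return (start, end) sample index ranges, end exclusive, where the
    combined intensity meets or exceeds the threshold."""
    portions, start = [], None
    for i, value in enumerate(combined):
        if value >= threshold and start is None:
            start = i                      # a portion of interest begins
        elif value < threshold and start is not None:
            portions.append((start, i))    # the portion ends before sample i
            start = None
    if start is not None:                  # portion runs to the end
        portions.append((start, len(combined)))
    return portions
```

Each returned index range would then be mapped back to the corresponding time positions in the media content. The relative high/low variant could reuse `combine` and compare each sample against a statistic of the combined signal instead of a fixed threshold.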

In a further implementation, associating a weighting with a particular signal comprises: training a classifier to predict whether a given signal weighting would result in identifying a portion of interest in media content using the historical attributes associated with the group of users; and providing attributes associated with the particular signal as input to the classifier and obtaining the weighting for the particular signal as output of the classifier.

In yet another implementation, the individual is an editor of the video. A particular historical attribute associated with an editor of the video can comprise: a propensity of the editor to favor video content having a particular signal, a propensity of the editor to favor video content lacking a particular signal, a propensity of the editor to favor video content having a particular signal with a particular signal intensity, a propensity of the editor to disfavor video content having a particular signal, a propensity of the editor to disfavor video content lacking a particular signal, or a propensity of the editor to disfavor video content having a particular signal with a particular signal intensity.

A particular historical attribute associated with the group of users can comprise: a propensity of the group of users to favor video content having a particular signal, a propensity of the group of users to favor video content lacking a particular signal, a propensity of the group of users to favor video content having a particular signal with a particular signal intensity, a propensity of the group of users to disfavor video content having a particular signal, a propensity of the group of users to disfavor video content lacking a particular signal, or a propensity of the group of users to disfavor video content having a particular signal with a particular signal intensity.

In one implementation, the method further comprises: for at least one of the signals, associating a second weighting with the signal, wherein the second weighting is determined based at least in part on historical attributes associated with one or more of an editor of the video, a user other than the editor, and a group of users; and identifying one or more second overall portions of interest of the video based on the at least one intermediate portion of interest and the at least one second signal weighting.

In another implementation, the method further comprises: providing a visual representation of a timeline of the video wherein the timeline comprises a plurality of different time positions in the video; and indicating the identified overall portions of interest in the visual representation of the timeline.

In another aspect, a computer-implemented method comprises: providing a visual representation of a timeline of a video wherein the timeline comprises a plurality of different time positions in the video; providing a visual representation of one or more signals along the timeline of the video, wherein each signal representation comprises a respective intensity of the signal over the time positions; receiving, from each of a plurality of users, an indication of interest in a portion of the video; and providing a visual representation on the timeline of each indication of interest. Other embodiments of this aspect include corresponding systems and computer programs.

In one implementation, the visual representation of the signals comprises a visual representation of a weighted sum of signals. A particular indication of interest can comprise a comment, a like, a share, or a highlight.

In another implementation, the method further comprises determining a social signal based on the indications of interest, wherein an intensity of the social signal over a length of the video is based on a concentration of the indications of interest over the length of the video. The method can further comprise: receiving a second video comprising a plurality of signals, each signal representing an identifiable type of content over a length of the video; for at least one of the signals, associating a weighting with the signal, wherein the weighting is determined based at least in part on the social signal; and identifying one or more portions of interest in the second video based on the at least one signal weighting. The weighting can be determined based at least in part on the intensity of the social signal. The weighting can also be determined based at least in part on indications of interest from a plurality of videos.
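As an illustration of a social signal whose intensity follows the concentration of indications of interest, the sketch below counts indications per fixed-width time bin and normalizes by the busiest bin. The function name `social_signal`, the binning approach, the `bin_seconds` parameter, and the peak normalization are assumptions chosen for the example; the disclosure does not specify how concentration is computed.

```python
import math


def social_signal(interest_times, duration, bin_seconds=1.0):
    """Build a per-bin intensity from timestamps of comments, likes, etc.

    interest_times: times (seconds) at which users indicated interest.
    Returns a list with one value per bin, scaled so the peak is 1.0.
    """
    n_bins = max(1, math.ceil(duration / bin_seconds))
    counts = [0] * n_bins
    for t in interest_times:
        if 0 <= t <= duration:
            # An indication exactly at the end falls into the last bin.
            counts[min(int(t // bin_seconds), n_bins - 1)] += 1
    peak = max(counts)
    return [c / peak for c in counts] if peak else [0.0] * n_bins
```

The resulting sequence has the same shape as any other per-sample signal, so under these assumptions it could be drawn along the timeline or fed into a weighting step alongside content signals.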

In a further implementation, a particular signal represents one of motion, sound, presence of faces, recognized faces, recognized objects, recognized activities, recognized scenes, sensor readings, context, or user-specified.

In another aspect, a computer-implemented method comprises: receiving on a user device media content comprising a digital video or a digital photograph; providing a plurality of filters that can be applied to at least a portion of the media content; determining a subset of the plurality of filters to recommend to a user of the device based on one or more attributes associated with at least one of the media content, the user, a group of users, or the user device; visually identifying the subset of recommended filters; receiving from the user a selection of one or more of the plurality of filters; and applying the selected one or more filters to the digital content.

A particular attribute associated with the media content can include geolocation, a point of interest, motion, sound, a recognized face, a recognized object, a recognized activity, or a recognized scene. A particular attribute associated with the user can include a historical filter preference of the user or a recent filter preference of the user. A particular attribute associated with a group of users can include a historical filter preference of the group of users or a recent filter preference of the group of users. A particular attribute associated with the user device can include a property of an image sensor of the user device, a device model, or an image capture setting.

In one implementation, determining the subset of the plurality of filters to recommend to a user of the device comprises: training a classifier to predict whether a given combination of attributes would result in a particular filter being selected by a user based on historical filter selections and corresponding historical attributes associated with at least one of media content, a user, a group of users, or a user device; and providing one or more attributes associated with at least one of the media content, the user, a group of users, or the user device as input to the classifier and obtaining the subset of recommended filters as output of the classifier.
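The train-then-query flow can be illustrated with a deliberately simple stand-in for the classifier: a table of how often each filter was chosen alongside each attribute, queried by summing the counts for the attributes at hand. This co-occurrence count is a substitute chosen for brevity, not the classifier of the disclosure, and the function names, attribute strings, and filter names below are hypothetical.

```python
from collections import Counter, defaultdict


def train(history):
    """Count, per attribute, how often each filter was selected.

    history: iterable of (attributes, chosen_filter) pairs, where
    attributes is a set of strings such as "scene:beach".
    """
    counts = defaultdict(Counter)
    for attributes, chosen in history:
        for attr in attributes:
            counts[attr][chosen] += 1
    return counts


def recommend(counts, attributes, k=2):
    """Return up to k filter names ranked by summed co-occurrence counts."""
    scores = Counter()
    for attr in attributes:
        scores.update(counts.get(attr, {}))   # unseen attributes add nothing
    return [name for name, _ in scores.most_common(k)]
```

A real system would likely replace the count table with a trained model that can weigh attribute combinations, but the interface (historical selections in, a ranked subset of filters out) would have the same shape.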

In one aspect, a computer-implemented method comprises: providing a visual representation of a timeline of a video wherein the timeline comprises a plurality of different time positions in the video; indicating a selected portion of the timeline in the visual representation wherein the selected portion is a continuous region of the timeline bounded by a first border and a second border wherein each border corresponds to a different respective time position on the timeline; receiving a first user input action along a first axis of the timeline; changing a position of the first border along the timeline and based on the first user input action; receiving a second user input action; and simultaneously displaying a first portion of the video and a second portion of the video. Other embodiments of this aspect include corresponding systems and computer programs.

In one implementation, the method further comprises receiving a third user input action along the first axis of the timeline; and changing a position of the second border along the timeline and based on the third user input action. The first axis can be parallel to the timeline, as well as coaxial with the timeline.

In another implementation, the second user input action comprises a motion along a second axis of the timeline. The second axis can be perpendicular to the first axis.

In a further implementation, the displayed first portion of the video comprises an image corresponding to a portion of the video at a beginning of the selected portion with respect to the timeline. In another implementation, the displayed second portion of the video comprises an image corresponding to a portion of the video at an end of the selected portion with respect to the timeline.

In yet another implementation, a particular user input action is a touchscreen gesture, a mouse gesture, a tap, a click, a click-and-drag, a tracked free-hand gesture, a tracked eye movement, a button press, or an applied pressure.

In one implementation, the first border moves along the timeline simultaneously with receiving the first user input action. Likewise, the second border can move along the timeline simultaneously with receiving the third user input action. The first user input action can include a motion between a first point on the timeline and a second point on the timeline, and the first border can move from the first point to the second point in direct correspondence with the first user input action. Similarly, the third user input action can include a motion between a first point on the timeline and a second point on the timeline, and the second border can move from the first point to the second point in direct correspondence with the third user input action.

In a further implementation, the method includes updating the displayed first portion of the video to correspond to a change in position of the first border along the timeline. The method can further include updating the displayed second portion of the video to correspond to a change in position of the second border along the timeline.

In another implementation, the video is a first video, and the method further includes generating a second video based on the selected portion, the second video comprising at least a portion of the first video.

The details of one or more implementations of the subject matter described in the present specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, like reference characters generally refer to the same parts throughout the different views. Also, the drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the implementations. In the following description, various implementations are described with reference to the following drawings, in which:

FIG. 1 depicts an example system architecture for a video editing and playback system according to an implementation.

FIG. 2 depicts an example user interface for a video editing system according to an implementation.

FIGS. 3A and 3B depict example user input motions for positioning a selected portion on a timeline.

FIGS. 4A and 4B depict example user input motions for resizing a selected portion on a timeline.

FIG. 5 depicts an example user input motion for splitting media content on a timeline.

FIG. 6 depicts a flowchart of an example method for manipulating a selected portion of a timeline.

FIG. 7 depicts a flowchart of an example method for weighting a plurality of signals.

FIG. 8 depicts an example user interface for a video playback system according to an implementation.

FIG. 9 depicts a flowchart of an example method for providing an experience map for media content.

FIG. 10 depicts a flowchart of an example method for recommending a filter for application to media content.

FIGS. 11 and 12 depict example graphical user interfaces for selecting a filter to apply to media content.

FIGS. 13A-13C depict example user input motions for defining a selected portion on a timeline.

DETAILED DESCRIPTION

Described herein in various implementations are systems and methods for editing, manipulating, and viewing media content. Media content can include digital media encoded in a machine-readable format, including but not limited to audio (e.g., sound recordings of events, activities, performances, speech, music, etc.), video (visual recordings of events, activities, performances, animation, etc.), and other forms of media content usable in conjunction with the techniques described herein. Media content can also include streaming media (recorded or live).

FIG. 1 depicts an example high-level system architecture in which an application 115 on a user device 110 communicates with one or more remote servers 120 over communications network 150. The user device 110 can be, for example, a smart phone, tablet computer, smart watch, smart glasses, portable computer, mobile telephone, laptop, palmtop, gaming device, music device, television, smart or dumb terminal, network computer, personal digital assistant, wireless device, information appliance, workstation, minicomputer, mainframe computer, or other computing device, that is operated as a general purpose computer or as a special purpose hardware device that can execute the functionality described herein.

The application 115 on the user device 110 can provide media playback and editing functionality to a device user. In one implementation, the application 115 provides a user interface that allows a user to browse through, manipulate, edit, and/or play media content (e.g., a video file, an audio file, etc.) using a visual representation of a timeline. In another implementation, the application 115 analyzes media content to identify one or more portions of interest, which analysis can be based on a weighting of various signals associated with the content. As used herein, a “signal” refers to time-varying data describing an identifiable type of content in audio, video, or other media content or a portion thereof, including, but not limited to, motion data (e.g., displacement, direction, velocity, acceleration, orientation, angular momentum, and time), sound, geographic location, presence of faces, recognized faces, recognized objects, recognized activities, and recognized scenes. A signal can also refer to a time-varying or static attribute associated with media content or a portion thereof, including, but not limited to, popularity (e.g., measurement of likes, recommendations, sharing), context, sensor readings on a device (e.g., readings from an accelerometer, gyroscope, heart rate sensor, compass, light sensor, motion sensor, and the like), user label (e.g., a comment, hashtag, or other label that can provide hints as to the content of a media file), location, date, time, weather, and user-specified (e.g., manually-defined as interesting). Signal weighting data can be stored locally on the user device 110 and/or can be transferred to and received from remote server 120.

Remote server(s) 120 can aggregate signal weighting data, social experience information, and other media analytics received from user device 110 and other user devices 180 and share the data among the devices over communications network 150. In some implementations, remote server(s) 120 host and/or proxy media, webpages, and/or other content that are accessible by the user device 110 via application 115. Remote server(s) 120 can also perform portions of the various processes described herein; for example, analysis of media content to identify signals can be performed in whole or in part remotely, rather than locally on the user device 110.

Third-party services 170 can include social networking, media sharing, content distribution, and/or other platforms through which a user can send, receive, share, annotate, edit, track, or take other actions with respect to media content using, e.g., application 115 via communications network 150. Third-party services 170 can include, but are not limited to, YouTube, Facebook, WhatsApp, Vine, Snapchat, Instagram, Twitter, Flickr, and Reddit.

Implementations of the present system can use appropriate hardware or software; for example, the application 115 and other software on user device 110 and/or remote server(s) 120 can execute on a system capable of running an operating system such as the Microsoft Windows® operating systems, the Apple OS X® operating systems, the Apple iOS® platform, the Google Android™ platform, the Linux® operating system and other variants of UNIX® operating systems, and the like. The software can be implemented on a general purpose computing device in the form of a computer including a processing unit, a system memory, and a system bus that couples various system components including the system memory to the processing unit.

Additionally or alternatively, some or all of the functionality described herein can be performed remotely, in the cloud, or via software-as-a-service. For example, as described above, certain functions, such as those provided by the remote server 120, can be performed on one or more servers or other devices that communicate with user devices 110, 180. The remote functionality can execute on server class computers that have sufficient memory, data storage, and processing power and that run a server class operating system (e.g., Oracle® Solaris®, GNU/Linux®, and the Microsoft® Windows® family of operating systems).

The system can include a plurality of software processing modules stored in a memory and executed on a processor. By way of illustration, the program modules can be in the form of one or more suitable programming languages, which are converted to machine language or object code to allow the processor or processors to execute the instructions. The software can be in the form of a standalone application, implemented in a suitable programming language or framework.

Method steps of the techniques described herein can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. Method steps can also be performed by, and apparatus can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). Modules can refer to portions of the computer program and/or the processor/special circuitry that implements that functionality.

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors. Generally, a processor receives instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. One or more memories can store media assets (e.g., audio, video, graphics, interface elements, and/or other media files), configuration files, and/or instructions that, when executed by a processor, form the modules, engines, and other components described herein and perform the functionality associated with the components. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

In some implementations, the user device 110 includes a web browser, native application, or both, that facilitates execution of the functionality described herein. A web browser allows the device to request a web page or other program, applet, document, or resource (e.g., from a remote server 120 or other server, such as a web server) with an HTTP request. One example of a web page is a data file that includes computer executable or interpretable information, graphics, sound, text, and/or video, that can be displayed, executed, played, processed, streamed, and/or stored and that can contain links, or pointers, to other web pages. In one implementation, a user of the user device 110 manually requests a resource from a server. Alternatively, the device 110 automatically makes requests with a browser application. Examples of commercially available web browser software include Microsoft® Internet Explorer®, Mozilla® Firefox®, and Apple® Safari®.

In other implementations, the user device 110 includes client software, such as application 115. The client software provides functionality to the device 110 that provides for the implementation and execution of the features described herein. The client software can be implemented in various forms; for example, it can be in the form of a native application, web page, widget, and/or Java, JavaScript, .Net, Silverlight, Flash, and/or other applet or plug-in that is downloaded to the device and runs in conjunction with a web browser. The client software and the web browser can be part of a single client-server interface; for example, the client software can be implemented as a plug-in to the web browser or to another framework or operating system. Other suitable client software architecture, including but not limited to widget frameworks and applet technology, can also be employed with the client software.

A communications network 150 can connect user devices 110, 180 with one or more servers or devices, such as remote server 120. The communication can take place over media such as standard telephone lines, LAN or WAN links (e.g., T1, T3, 56kb, X.25), broadband connections (ISDN, Frame Relay, ATM), wireless links (802.11 (Wi-Fi), Bluetooth, GSM, CDMA, etc.), for example. Other communication media are contemplated. The network 150 can carry TCP/IP protocol communications, and HTTP/HTTPS requests made by a web browser, and the connection between the client device and servers can be communicated over such TCP/IP networks. Other communication protocols are contemplated.

The system can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules can be located in both local and remote computer storage media including memory storage devices. Other types of system hardware and software than that described herein can also be used, depending on the capacity of the device and the amount of required data processing capability. The system can also be implemented on one or more virtual machines executing virtualized operating systems such as those mentioned above, and that operate on one or more computers having hardware such as that described herein.

It should also be noted that implementations of the systems and methods can be provided as one or more computer-readable programs embodied on or in one or more articles of manufacture. The program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).

FIG. 2 depicts an example user interface (UI) 200 of application 115 for the playback and editing of media content, such as audio and/or video captured by a mobile device. UI 200 includes a visual timeline representation 220 that a user can manipulate to navigate and/or edit the media content. In one implementation, if the media content is streaming (whether live or prerecorded), the timeline 220 can be dynamically updated or otherwise moving in a synchronized manner with the media stream. Playback of the media content, or a selected portion thereof, can be shown in display window 210. If the media content does not include video or other image-based content, the display window 210 can be hidden, blank, or display a visual representation of the content (e.g., a sound wave or captions for audio). UI 200 further includes a “Share” button 240 that enables the user to indicate that the media content can be transmitted from the user's device 110 to one or more third-party services 170. The application 115 can be configured with the user's third-party service account information such that the “Share” button 240 requires a single interaction to upload the current media content to one or more of the services. The user can also be provided with a dialog that allows the user to select which third-party services will be sent the content. Communication between the application 115 on the user device 110 and the third-party services 170 can be direct, or in some implementations, the application 115 provides the media content to remote server 120, which interfaces with the various third-party services 170 and relays the content appropriately.

The timeline 220 can include graphical and/or textual elements and can be, for example, a continuous track that includes visual indicators (e.g., ticks, icons, colors, etc.) of different time positions in the video. In one implementation, the timeline 220 includes thumbnails 225 of individual video frames representing respective portions of a video file. The thumbnails 225 can hover, change in size, scroll, and/or otherwise be manipulated on the timeline 220 as the user interacts with the UI 200. In one example, if the user device 110 includes a touchscreen interface (e.g., a smartphone), the user can manipulate the video frame thumbnails 225 on the timeline 220 using his thumb, fingers, a stylus, or other input apparatus. Based on the size of the thumbnails 225, screen size, and/or display resolution of the device 110, only a portion of the timeline 220 (and thus a subset of the thumbnails 225) may be visible at any one time. The user then can move along the timeline 220 by, for example, swiping the thumbnails 225 along an axis of the timeline, moving the thumbnails 225 on or off the visible portion of the timeline 220 on the device screen. In other implementations, the entire timeline 220 and displayed thumbnails 225 are sized and/or positioned to fit on the device screen.

In one implementation, the timeline 220 includes one or more visual indicators delineating a selected portion 230 of the timeline 220. The selected portion 230 can be a continuous region of the timeline 220 that is bounded by a first border 234 and a second border 236. Each border 234, 236 can correspond to a different respective time position on the timeline 220. The selected portion can be moved, resized, split, or otherwise manipulated upon the application 115 receiving a user input action. The user input action can be received via a component of the user device 110 (e.g., touchscreen, touchpad, pointing stick, click wheel, camera, microphone, gyroscope, accelerometer, built-in keyboard, etc.) and/or a separate input peripheral coupled to the user device 110 (e.g., mouse, keyboard, trackball, joystick, camera, microphone, game controller, virtual headset, etc.). The user input action can be, for example, a touchscreen gesture, a mouse gesture, a tap, a click, a click-and-drag, a tracked eye movement, a button press, an applied pressure, or other input action suitable for allowing a user to manipulate the selected portion 230 of the timeline 220. In one implementation, the user input action is a tracked free-hand gesture captured in conjunction with a user's use of an Oculus or other virtual reality device, where the user is presented with the timeline 220 in three-dimensional space, and the selected portion 230 (the window) can be adjusted based on a single free-hand movement along a particular axis.

In various implementations, the selected portion 230 and/or other components of the timeline 220 are manipulated by the one or more user input actions. For example, the position of the selected portion 230 on the timeline 220 can be changed by a user input action substantially on or along a first axis of the timeline 220 (e.g., some deviation in the input action can be tolerated). The first axis can be, e.g., parallel to or coaxial with the timeline 220, perpendicular to the timeline, angled, or disposed in another position in relation to the timeline 220, in two-dimensional coordinate space (x, y) or three-dimensional coordinate space (x, y, z). There can be multiple axes that permit the user to change the position of the selected portion 230. In some implementations, the selected portion 230 follows the direction of the user's finger, stylus, mouse cursor, or other means of input. In one implementation, the user moves the selected portion 230 by interacting directly with the selected portion 230 whereas, in another implementation, the user can move the selected portion 230 by interacting with any portion of the timeline 220 and/or the first axis. For instance, the user can cause the selected portion 230 to jump to a different area of the timeline 220 by tapping, clicking, pressing, or taking other action with respect to the different area. In further implementations, the first axis is disposed above the timeline 220, below the timeline 220, or at another position on the UI 200.

As shown in FIG. 3A, in one implementation, a user moves the selected portion 230 (and simultaneously the corresponding borders 234 and 236) in direction 320 by swiping his thumb (or finger) 310 in the same direction 320 along horizontal (x) axis 330, which in this example is coaxial with the timeline 220. Likewise, as shown in FIG. 3B, the user can move the selected portion 230 (and simultaneously the corresponding borders 234 and 236) in the opposite direction 350 by swiping his thumb 310 in that opposite direction 350 along horizontal axis 330. In some implementations, the user's thumb 310 must be positioned on a border 234, 236 or between the borders 234 and 236 to cause the selected portion 230 to move along the timeline 220. The repositioning of the selected portion 230 can be simultaneous with and track the movement of the user's thumb 310 (e.g., the further the user moves his thumb 310 along the horizontal axis 330, the further the selected portion 230 is moved on the timeline). Of note, by allowing the user to reposition the selected portion 230 with a single thumb or finger, the user is able to hold the user device 110 and operate the UI 200 with one hand.

In another example, the selected portion 230 can be resized (i.e., one or both of the borders 234, 236 is moved closer or further away from the other border(s)) by a user input action substantially along a second, different axis of the timeline 220 (e.g., some deviation in the input action can be tolerated). The second axis can be, e.g., parallel to or coaxial with the timeline 220, perpendicular to the timeline, angled, or disposed in another position in relation to the timeline 220, in two-dimensional coordinate space (x, y) or three-dimensional coordinate space (x, y, z). There can be multiple axes that permit the user to resize the selected portion 230. In some implementations, the selected portion 230 is resized based on the direction of the user's finger, stylus, mouse cursor, or other means of input. In one implementation, the user resizes the selected portion 230 by interacting directly with the selected portion 230 whereas, in another implementation, the user can resize the selected portion 230 by interacting with any portion of the timeline 220 and/or the second axis. In further implementations, the second axis is disposed above the timeline 220, below the timeline 220, or at another position on the UI 200.

In some implementations, the selected portion 230 can snap to one or more preset sizes while being resized. A particular snap-to size can be based on which third-party service(s) 170 are available to the user for sharing or saving media content, which service(s) 170 the user has specifically configured for use with the application 115, which service(s) 170 the user has previously used to share content, and/or which service(s) 170 the system predicts the user will use to share content. Custom snap-to sizes can also be manually configured by the user. In some implementations, upon opening media content in the application 115, the selected portion 230 defaults to a preset size based on, for example, one or more of the above factors. As one example, if the user most frequently posts video content to Vine, the selected portion 230 can default to a six-second time span.
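The snap-to behavior described above can be illustrated with a minimal sketch. The function name `snap_length`, the tolerance parameter, and the preset list are hypothetical; only the six-second value comes from the Vine example, and the 15-second preset is included purely for illustration.

```python
def snap_length(length_s, presets_s, tolerance_s=0.5):
    """Return the nearest preset length if it lies within tolerance_s
    of the requested length; otherwise return the length unchanged."""
    if not presets_s:
        return length_s
    nearest = min(presets_s, key=lambda p: abs(p - length_s))
    return nearest if abs(nearest - length_s) <= tolerance_s else length_s


# Hypothetical presets: 6 s follows the Vine example; 15 s is illustrative only.
PRESETS = [6.0, 15.0]

print(snap_length(6.3, PRESETS))   # close to a preset, so it snaps
print(snap_length(10.0, PRESETS))  # far from every preset, so it is unchanged
```

In such a sketch the tolerance controls how "sticky" a preset feels while the user is resizing; a larger tolerance snaps earlier.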

As shown in FIG. 4A, in one implementation, a user resizes (reduces the size of) the selected portion 230 (and simultaneously moves the borders 234 and 236 inward toward each other; namely, border 234 in direction 430 and border 236 in direction 440) by swiping his thumb 310 in direction 420 along vertical (y) axis 410, which in this example is perpendicular to the timeline 220 and horizontal axis 330. Likewise, as shown in FIG. 4B, the user can resize (increase the size of) the selected portion 230 (and simultaneously move the borders 234 and 236 away from each other; namely, border 234 in direction 460 and border 236 in direction 470) by swiping his thumb 310 in the opposite direction 450 along vertical axis 410. The resizing of the selected portion 230 can be simultaneous with and track the movement of the user's thumb 310 (e.g., the further the user moves his thumb 310 along the vertical axis 410, the more the selected portion 230 is increased or reduced in size). Of note, by allowing the user to resize the selected portion 230 with a single thumb or finger, the user is able to hold the user device 110 and operate the UI 200 with one hand.

In another example, a user can increase the size of the selected portion 230 (i.e., move the borders 234 and 236 away from each other) by swiping his thumb 310 along vertical (y) axis 410 in a direction away from the horizontal axis 330 (either up or down). The user can then reduce the size of the selected portion 230 by moving his thumb 310 in the opposite direction along vertical (y) axis 410, back toward horizontal axis 330. Effectively, the size of the selected portion 230 is a function of the absolute value of the distance, up or down, from horizontal axis 330.
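As a sketch of this absolute-value behavior, the selection length can be computed from the thumb's unsigned distance to the horizontal axis. The function name, the linear pixel-to-seconds scale, and the base size are assumptions made for illustration and are not specified in the description.

```python
def size_from_offset(thumb_y_px, axis_y_px, base_size_s, seconds_per_px):
    """Selection length grows with the unsigned distance between the
    thumb and the horizontal axis, regardless of direction (up or down)."""
    return base_size_s + seconds_per_px * abs(thumb_y_px - axis_y_px)


# Moving 50 px above or 50 px below the axis yields the same length.
above = size_from_offset(50, 100, base_size_s=2.0, seconds_per_px=0.1)
below = size_from_offset(150, 100, base_size_s=2.0, seconds_per_px=0.1)
print(above == below)
```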

In some implementations, minimum and/or maximum constraints are set for the size of the selected portion 230. For example, if the minimum length of a video is one second, the minimum length of the selected portion 230 can also be set to one second. The maximum selection length can also vary based on, e.g., where the video will be shared (on Vine, the maximum length is six seconds).
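One possible way to enforce such constraints is sketched below. The function `clamp_selection` and its policy of holding the start border fixed where possible are assumptions; the description does not state which border moves when a constraint is hit.

```python
def clamp_selection(start_s, end_s, media_len_s, min_len_s=1.0, max_len_s=None):
    """Clamp a selection to [min_len_s, max_len_s] and keep it inside the media.

    Assumed policy: the start border is held where possible and the end
    border is adjusted; the selection is shifted left if it would overrun.
    """
    upper = media_len_s if max_len_s is None else min(max_len_s, media_len_s)
    length = max(min_len_s, min(end_s - start_s, upper))
    length = min(length, media_len_s)  # media shorter than the minimum length
    start = max(0.0, min(start_s, media_len_s - length))
    return start, start + length


print(clamp_selection(10.0, 20.0, 60.0, max_len_s=6.0))  # too long: trimmed to 6 s
print(clamp_selection(5.0, 5.2, 60.0))                   # too short: grown to 1 s
```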

In another example, the selected portion 230 can delineate a portion of the media content that should be removed, retained, or ignored (effectively splitting the media content) by a user input action substantially along a third, different axis of the timeline 220 (e.g., some deviation in the input action can be tolerated). The third axis can be, e.g., parallel to or coaxial with the timeline 220, perpendicular to the timeline, angled, or disposed in another position in relation to the timeline 220, in two-dimensional coordinate space (x, y) or three-dimensional coordinate space (x, y, z). There can be multiple axes that permit the user to split the media content via the selected portion 230. In one implementation, the user splits the media content by interacting directly with the selected portion 230 whereas, in another implementation, the user can split the media content by interacting with any portion of the timeline 220 and/or the third axis. In further implementations, the third axis is disposed above the timeline 220, below the timeline 220, or at another position on the UI 200.

As shown in FIG. 5, in one implementation, a user splits the video by tapping, pressing, clicking (or other user input action) along a z-axis perpendicular to both horizontal x-axis 330 and vertical y-axis 410. For instance, by tapping on the selected portion 230, the video is split into two portions 510a and 510b on either side of the selected portion 230, and the video segment delineated by the selected portion 230 is discarded from the video or otherwise ignored for video playback, saving, and/or sharing. In one implementation, by tapping on the selected portion 230, the split portions 510a and 510b become selected portions, and the selected portion 230 is deselected. In another example, by tapping on the selected portion 230, the two portions 510a and 510b are discarded from the video or otherwise ignored, leaving just the selected portion 230. In some implementations, the user can create multiple selected portions by interacting with multiple portions of the timeline 220 (e.g., tapping on different areas of the timeline 220). The user can also interact again with a particular selected portion 230 to invert or cancel the selection.
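The two split outcomes described above (discarding the selected portion, or keeping only the selected portion) can be sketched as follows. The function `split_around` and its `keep` argument are hypothetical names; segments are represented as (start, end) pairs in seconds.

```python
def split_around(media_len_s, sel_start_s, sel_end_s, keep="outside"):
    """Return the retained segments after a split.

    keep="outside": discard the selection and keep the portions on either
                    side of it (corresponding to 510a and 510b).
    keep="inside":  keep only the selection itself.
    """
    if keep == "inside":
        return [(sel_start_s, sel_end_s)]
    parts = [(0.0, sel_start_s), (sel_end_s, media_len_s)]
    # Drop empty segments, e.g. when the selection touches either end.
    return [(a, b) for a, b in parts if b > a]


print(split_around(30.0, 10.0, 16.0))                 # two outer portions
print(split_around(30.0, 10.0, 16.0, keep="inside"))  # only the selection
```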

FIGS. 13A-13C illustrate an example technique for selecting a starting point and ending point of a selected portion of a video using a video editing interface. As shown in FIG. 13A, in one implementation, a user can adjust the size of a selected portion 1330 of a video along a video timeline 1302 by manipulating the left border 1334 of the portion 1330 independently of the right border 1336. For example, the user can swipe his thumb (or finger) 1310 along horizontal axis 1380 of the video timeline 1302 to increase (by swiping to the left) or decrease (by swiping to the right) the size of the selected portion 1330. Display 1390 shows one or more images derived from one or more frames of the video. For example, display 1390 can show a static image or a video clip (e.g., the portion of the video within the boundaries of the selected portion 1330). The image(s) can correspond to the first frame, last frame, or an intermediate frame(s) from a portion of the video that is defined by the selected portion 1330.

FIG. 13B further illustrates the resizing of the selected portion 1330 by manipulating the right border 1336 of the selected portion 1330 independently of the left border 1334. As described above with respect to the manipulation of the left border 1334, a user can swipe his thumb (or finger) 1310 along horizontal axis 1380 of the video timeline 1302 to increase (by swiping to the right) or decrease (by swiping to the left) the size of the selected portion 1330. Again, display 1390 can show one or more images from one or more frames (e.g., first, last, or intermediate frame(s)) from a portion of the video defined by the selected portion 1330.

Referring to FIG. 13C, the user can interact with the video editing interface to create a split-display based on the selected portion 1330. In one example, the interaction is a motion (e.g., swipe) that the user makes with his thumb (or finger) 1310 along a different axis, such as vertical axis 1385. In other instances, the interaction is a tap on the selected portion 1330, a tap on a separate interface button, or other user action. In the depicted example, the user swipes his thumb 1310 in a downward direction along axis 1385, which causes display 1390 from FIGS. 13A and 13B to split into two (or, in some instances, more than two) individual displays 1390a, 1390b. In one implementation, display 1390a corresponds to a portion of the video marked by or abutting border 1334 with respect to the timeline 1302, and display 1390b corresponds to a portion of the video marked by or abutting border 1336 with respect to the timeline 1302. For example, display 1390a can show an image or short video clip corresponding to the start of the video within the selected portion 1330. Similarly, display 1390b can show an image or short video clip corresponding to the end of the video within the selected portion 1330. Other displays can show, e.g., intermediate portions of the video within the selected portion 1330.

Referring back to FIG. 2, the timeline 220 can also include visual indicators of portions of interest 250 in the media content, as further described below. The visual indicators 250 can include graphics and/or text, and can include various shapes, colors, icons, animations, and so on, that indicate different types of portions of interest (e.g., based on sound, motion, facial recognition, and/or other signals). A particular visual indicator 250 can designate a starting point of a portion of interest, an ending point of a portion of interest, and/or a continuous portion of the content that is of interest. In some implementations, the visual indicators facilitate a user's navigation of the timeline 220 by notifying (actively and/or passively) the user which segments of the video may be interesting to the user. In some implementations, the user can interact with a particular visual indicator 250 to jump to that position in the timeline 220 and/or to view more information about why the video segment is considered to be interesting. In another implementation, the timeline 220 further includes visual indicators that identify points and/or portions of interest manually tagged by a user, whether during or after recording of the content.

FIG. 6 depicts one implementation of a method 600 for manipulating the selected portion 230 in the timeline 220. In STEP 602, a visual representation of a timeline of media content is provided in UI 200 of application 115. The timeline 220 comprises a plurality of different time positions in the media content. A selected portion 230 of the timeline 220 is indicated as a continuous region of the timeline 220 bounded by a first border 234 and a second border 236 (STEP 606). Each border 234, 236 of the region corresponds to a different respective time position on the timeline 220. If the UI 200 receives a first user input action along a first axis of the timeline 220 (e.g., a touchscreen swipe along a horizontal timeline axis), the position of the selected portion 230 is changed based on the first user input action (STEP 610). If the UI 200 receives a second user input action along a second axis of the timeline 220 (e.g., a touchscreen swipe along a vertical timeline axis), the selected portion 230 is resized based on the second user input action (STEP 614). Resizing the selected portion 230 can include moving both of the respective time positions of the borders to be closer to each other or farther from each other. If the UI 200 receives a third user input action along a third axis of the timeline 220 (e.g., a tap along a z-axis of the timeline 220), the media content is split into a plurality of portions based on the position of the selected portion 230 on the timeline 220 when the third user input action is received (STEP 618).
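The dispatch among STEPS 610, 614, and 618 can be sketched as a function that classifies an input action by its dominant axis. All names here (`Selection`, `apply_gesture`, `seconds_per_px`, `min_travel_px`) are hypothetical, and the choice that a positive vertical displacement enlarges the selection is an assumption; the description does not fix a sign convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    start_s: float  # time position of the first border (234)
    end_s: float    # time position of the second border (236)


def apply_gesture(sel, dx_px=0.0, dy_px=0.0, tapped=False,
                  seconds_per_px=0.5, min_travel_px=10.0):
    """Classify an input action by its dominant axis and apply it.

    Comparing |dx| with |dy| tolerates some deviation from a pure
    horizontal or vertical swipe. Returns (action_name, new_selection).
    """
    if tapped:                                   # third axis (z): split
        return "split", sel
    if abs(dx_px) >= abs(dy_px) and abs(dx_px) > min_travel_px:
        shift = dx_px * seconds_per_px           # first axis (x): move
        return "move", Selection(sel.start_s + shift, sel.end_s + shift)
    if abs(dy_px) > min_travel_px:               # second axis (y): resize
        center = (sel.start_s + sel.end_s) / 2
        half = max(0.0, (sel.end_s - sel.start_s) / 2
                   + dy_px * seconds_per_px / 2)
        # Both borders move symmetrically about the center.
        return "resize", Selection(center - half, center + half)
    return "none", sel


print(apply_gesture(Selection(10.0, 16.0), dx_px=40, dy_px=3))
print(apply_gesture(Selection(10.0, 16.0), dx_px=2, dy_px=12))
```

Because each action maps to a single axis, every operation in this sketch can be driven by one thumb, consistent with the one-handed operation noted above.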

In one implementation, the application 115 on the user device 110 automatically identifies one or more portions of media content that may be of interest to the device user. The automatic identification can also be performed remotely, wholly or in part, by, e.g., remote server 120. Portions of interest in media content can be automatically identified based on one or more signals associated with the content. As described above, a signal can represent an identifiable type of content within digital media (e.g., motion, sound, recognized faces (known or unknown people), recognized objects, recognized activities, recognized scenes, and the like), as well as an attribute associated with the media (e.g., popularity, context, location, date, time, weather, news reports, and so on).

A signal can vary in intensity over the length of an audio file, video file, or other media content. “Signal intensity,” as used herein, refers to the presence of a particular content type in media content and, in some cases, the extent to which the content type exists in a particular portion of the media content. In the case of explicit signals and certain attributes associated with media content, signal intensity can be binary (e.g., exists or does not exist). For content types such as motion, sound, facial recognition, and so on, as well as certain sensor readings, the intensity can be a function of the concentration of the content type in a particular portion of the media content, and can, for example, vary over a fixed range or dynamic range (e.g., defined relative to the intensities over the signal domain and/or relative to other signals), or fall into defined levels or tiers (e.g., zero intensity, low intensity, medium intensity, high intensity). In the case of motion content, portions of a media file that are determined to have higher instances of movement (or a particular type of movement indicative of a particular activity such as, for example, skiing or bicycle riding) will have correspondingly higher motion intensity levels. As another example, the intensity of audio content can be determined based on the loudness of audio in a particular portion of media content. For general facial recognition, intensity can be based on the number of identified faces in a particular portion of a video (e.g., more faces equals higher intensity). For known facial recognition, intensity can be based on the number of identified faces that are known to a user in a particular portion of a video (e.g., friends, family, social networking connections, etc.). In the case of external sensor readings associated with media content (e.g., an accelerometer in a smartphone), intensity can be based on the amount or strength of the readings detected by the sensor (e.g., for the accelerometer, stronger movement readings equals higher intensity).
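As an illustration of a dynamic range and of tiered intensity levels, the following sketch rescales raw per-portion readings relative to the signal's own minimum and maximum and then buckets them. The function names and the equal-thirds tier boundaries are assumptions; the description does not define where tier boundaries fall.

```python
def normalize(samples):
    """Rescale raw readings to [0, 1] relative to the signal's own range
    (one possible 'dynamic range' definition)."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0] * len(samples)
    return [(x - lo) / (hi - lo) for x in samples]


def tier(value):
    """Map a normalized intensity to a named tier.
    The equal-thirds boundaries are illustrative only."""
    if value <= 0.0:
        return "zero"
    if value < 1 / 3:
        return "low"
    if value < 2 / 3:
        return "medium"
    return "high"


# Hypothetical per-second face counts for a four-second clip.
faces = [0, 1, 5, 10]
print([tier(v) for v in normalize(faces)])
```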

Certain signals are considered “implicit,” as they can be automatically identified based on the media content or an associated attribute. Implicit signals can include motion, sound, facial/object recognition, popularity, context, and so on. Other signals are “explicit,” in that they can include manually defined elements. For example, a user can manually tag a portion of a video prior to, during, or after recording (e.g., via the UI 200) to indicate that the portion should be considered interesting. In some implementations, while recording audio and/or video, the user manipulates a control (e.g., a button) on a recording device, on the user device 110, or on another external device (e.g., a wirelessly connected ring, wristwatch, pendant, or other wearable device) in communication (e.g., via Bluetooth, Wi-Fi, etc.) with the recording and/or user device 110, to indicate that an interesting portion of the audio/video is beginning. The user can then manipulate the same or a different control a second time to indicate that the interesting portion has ended. The period between the start and end time of the interesting portion can then be considered as having a “user-tagged” signal.

FIG. 7 depicts one implementation of a method 700 for identifying a portion of interest of media content. In STEP 702, a video is received (e.g., at user device 110, remote server 120, or other processing device). The video can include one or more signals, such as those signals described above. For at least one of the signals, an intermediate portion of interest in the video is identified based on the respective signal (STEP 706). A particular intermediate portion of interest of the video can be determined based on the intensity of a signal associated with that portion. For example, if a certain portion of the video has an incidence of loud noise relative to the rest of the video, that certain portion can be considered an intermediate portion of interest based on the intensity of the audio signal. In some implementations, intermediate portions of interest can be identified based on the intensity of multiple signals within the respective portions.
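One simple way to find a portion that is loud "relative to the rest of the video" is to flag runs of samples that exceed the signal's mean by some multiple of its standard deviation. This criterion is an assumption chosen for illustration, not the method stated in the description; the function name `portions_above` and the parameter `k` are likewise hypothetical.

```python
from statistics import mean, pstdev


def portions_above(intensity, k=1.0):
    """Return (start_index, end_index) runs, end exclusive, in which the
    signal exceeds mean + k * standard deviation of the whole signal."""
    threshold = mean(intensity) + k * pstdev(intensity)
    runs, start = [], None
    for i, x in enumerate(intensity):
        if x > threshold and start is None:
            start = i                      # a run begins
        elif x <= threshold and start is not None:
            runs.append((start, i))        # the run ends before index i
            start = None
    if start is not None:                  # run extends to the end
        runs.append((start, len(intensity)))
    return runs


# Hypothetical per-second loudness values with a burst at seconds 3-4.
loudness = [1, 1, 1, 9, 9, 1, 1, 1, 1, 1]
print(portions_above(loudness))
```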

In STEP 710, a weighting is associated with at least one of the signals. For example, only motion and facial recognition might be considered important for a particular video, so only those signals are given a non-zero weighting. In another instance, explicit signals are not included in the weighting. The weighting can be personal to a particular user, general based on other users, or a combination of both. More specifically, the weighting can be determined based on historical attributes associated with a media content editor (e.g., the user of the application 115, another individual that is recognized for creating popular media content, or other person or entity) and/or historical attributes associated with a group of users (e.g., users who have created media content with other application instances, users who have expressed interest in media content created by the application user, and/or other group of users whose actions can contribute to a determination of the importance of a particular signal relative to other signals).

For example, if a user creates skydiving videos and frequently indicates that portions containing a high signal intensity for sound are the most interesting to him (e.g., by sharing videos that often contain such high-signal-intensity portions), the system can allocate a higher weighting to the sound signal relative to other signals (e.g., sound is weighted at 60%, while the remainder of the signals make up the remaining 40% weighting). This weighting can be applied to other videos edited by the user and, in some instances, can be combined with weightings based on the preferences of user groups, as indicated above. If combined, individual and group weightings can be weighted equally (e.g., as an initial default weighting), or in other instances, one type can have a greater weight than the other. For example, if there is little or no training data available for a particular individual, the weightings based on user group preferences can be weighted more heavily. In some implementations, signal weighting is also dependent on the context or other attribute(s) associated with particular media content. For instance, if the user prefers high intensity sound signals in his skydiving videos, but prefers high intensity motion signals in his snowboarding videos, the system can weight the signals differently based on whether the user is editing one type of video or the other.
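Combining individual and group weightings can be sketched as a convex blend followed by renormalization. The function `blend_weights`, the `user_share` parameter, and the specific group values are hypothetical; only the 60% sound weighting and the equal-share default follow the example above.

```python
def blend_weights(user_w, group_w, user_share=0.5):
    """Blend per-signal weightings from an individual and from a group.

    user_share=0.5 weights the two sources equally; a lower value leans
    on the group (e.g., when little individual training data exists).
    The result is renormalized so the weights sum to 1.
    """
    signals = set(user_w) | set(group_w)
    mixed = {
        s: user_share * user_w.get(s, 0.0)
           + (1.0 - user_share) * group_w.get(s, 0.0)
        for s in signals
    }
    total = sum(mixed.values())
    return {s: v / total for s, v in mixed.items()} if total else mixed


# Sound at 60% follows the skydiving example; other values are illustrative.
user = {"sound": 0.6, "motion": 0.3, "faces": 0.1}
group = {"sound": 0.2, "motion": 0.6, "faces": 0.2}
print(blend_weights(user, group))                  # equal default blend
print(blend_weights(user, group, user_share=0.0))  # group preferences only
```

Context-dependent weighting, such as separate preferences for skydiving and snowboarding videos, could be handled in such a sketch by keeping one weighting dictionary per context and selecting it before blending.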

Historical attributes of a content editor and/or group of users can include the following: a propensity of the editor/group to favor media content having a particular signal (e.g., sound is preferred over motion, recognized faces, etc.), a propensity of the editor/group to favor media content lacking a particular signal (e.g., a video without recognized faces is preferred), a propensity of the editor/group to favor media content having a particular signal with a particular signal intensity (e.g., a high intensity of motion is preferred in an action-oriented video), a propensity of the editor/group to disfavor media content having a particular signal (e.g., portions of a video in which an ex-girlfriend's face appears are disfavored), a propensity of the editor/group to disfavor media content lacking a particular signal (e.g., video without user-tagged portions is disfavored), and a propensity of the editor/group to disfavor media content having a particular signal with a particular signal intensity (e.g., portions of a concert recording with a low intensity sound signal are disfavored).

The system can refine the weightings it applies to particular signals as data is collected over time relating to user and group preferences of the signals and signal intensities. In some implementations, the weighting process is facilitated or automatically performed using machine learning, pattern recognition, data mining, statistical correlation, support vector machines, Gaussian mixture models, and/or other suitable known techniques. In one example, signal attributes associated with particular weightings can be viewed as vectors in a multi-dimensional space, and the similarity between signal attributes of unweighted signals and signals with particular weightings (e.g., weightings that reflect preferred or otherwise popular media portions by the user and/or other users) can be determined based on a cosine angle between vectors or other suitable method. If the similarity meets a threshold, an unweighted signal can be assigned the weighting of the similar signal vector.
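The cosine-based assignment can be sketched as follows. The contents of the attribute vectors, the 0.9 threshold, and the function names are assumptions made for illustration; the description does not specify which attributes make up a vector or what threshold is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_weighting(unweighted_vec, known, threshold=0.9):
    """known: list of (attribute_vector, weighting) pairs.

    Return the weighting of the most similar known vector if its cosine
    similarity meets the threshold; otherwise return None.
    """
    if not known:
        return None
    vec, weighting = max(
        known, key=lambda kv: cosine_similarity(unweighted_vec, kv[0]))
    if cosine_similarity(unweighted_vec, vec) >= threshold:
        return weighting
    return None


# Hypothetical two-dimensional attribute vectors with known weightings.
known = [([1.0, 0.0], 0.6), ([0.0, 1.0], 0.2)]
print(assign_weighting([0.9, 0.1], known))  # close to the first vector
print(assign_weighting([1.0, 1.0], known))  # not close enough to either
```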

As another example, a classifier (e.g., a suitable algorithm that categorizes new observations) can be trained over time using various historical data, such as the historical attributes referred to above. A classifier can be personal to an individual user, and use training data based only on that user's signal preferences and other data. Other classifiers can be trained based on data associated with the preferences of a group of users. For instance, each time an editor shares media content or otherwise indicates that a portion of the media content is of interest, the signal information associated with the (portion of the) media content (e.g., signal preference, signal intensity preference, etc.) can be stored on the user device 110 and/or transferred to remote server 120 for use as training data to improve future weightings for the editor and/or groups of users. The input to such a classifier (e.g., upon creating new media content or opening a media file) can include signal data, intensity data, media content attribute data, and other information associated with the media content. The classifier can then determine, based on the input and the training data, an appropriate weighting of signals for the media content.

Still referring to FIG. 7, in STEP 714, one or more overall portions of interest of the media content are identified based on the intermediate portion(s) of interest and the signal weighting(s). An overall portion of interest can be identified by combining the signals according to their respective weightings, and selecting a portion of the combined signals (corresponding to a portion of the media content) that meets a threshold signal intensity. Alternatively or in addition, the top or bottom N combined signal intensity points (e.g., top/bottom one, top/bottom three, top/bottom five, top/bottom ten, etc.) can be used to determine the overall points of interest. For example, the top three points (in non-overlapping regions) can be identified, and the segments of the media content that surround each point (e.g., +/− N seconds on either side) can be considered overall portions of interest. To illustrate, when a user creates or opens a video via the application 115, the application 115 can suggest one or more portions of the video that might be of interest to the user (e.g., by a suitable form of visual indication), based on signals in the video and weightings determined based on the user, another user, and/or groups of users. In one implementation, the application 115 presents different signal weightings (and, in some cases, the corresponding portions of interest) to the user (e.g., a weighting based on the user's preferences, a weighting based on an expert's preferences, and/or a weighting based on a group of users' preferences) and allows the user to select which weighting(s) and/or portions of interest the user prefers.
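The combination and top-N selection of STEP 714 can be sketched as a weighted sum per time sample followed by a greedy pick of the strongest non-overlapping windows. The function names, the one-sample-per-second representation, and the greedy selection strategy are assumptions made for illustration.

```python
def combine(signals, weights):
    """Weighted sum of equal-length signals, sample by sample.
    Signals without a weighting contribute nothing."""
    n = len(next(iter(signals.values())))
    return [
        sum(weights.get(name, 0.0) * vals[i] for name, vals in signals.items())
        for i in range(n)
    ]


def top_portions(combined, n_points=3, pad=2):
    """Greedily pick up to n_points of the strongest samples whose
    surrounding windows (+/- pad samples, indices inclusive) do not overlap."""
    order = sorted(range(len(combined)),
                   key=lambda i: combined[i], reverse=True)
    chosen = []
    for i in order:
        lo, hi = max(0, i - pad), min(len(combined) - 1, i + pad)
        if all(hi < a or lo > b for a, b in chosen):
            chosen.append((lo, hi))
        if len(chosen) == n_points:
            break
    return sorted(chosen)


# Hypothetical one-sample-per-second signals for a ten-second video.
signals = {
    "sound":  [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "motion": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
}
combined = combine(signals, {"sound": 0.6, "motion": 0.4})
print(top_portions(combined, n_points=2, pad=1))
```

The threshold-based alternative mentioned above would instead keep every run of samples in the combined signal that meets a chosen intensity.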

FIG. 8 depicts one implementation of a video playback interface 800 (which can be included in application 115) through which users can play media content (e.g., audio, video, etc.) created and/or edited with the application 115 or by other means. Interface 800 includes a visual representation of a timeline 810, which defines different time positions in the media content and can be manipulated by a user to navigate through media content shown in display window 820.

Interface 800 also includes visual representations of signals 830 associated with the media content. The signal representations 830 can be disposed on, above, below, or otherwise proximate to the timeline, and can depict the intensity of one or more of the media content signals as a waveform over the length of the media content. If multiple signal waveforms are displayed, each can be a different color and/or include some other differentiating identifier (e.g., identifying text displayed upon selection or hover). In some implementations, the user can configure the interface 800 to display all signals associated with the media content, no signals, or a subset thereof (e.g., the user can toggle the display of individual signals, implicit signals, explicit signals, and other signal categories). The user can also configure the interface 800 to display the signals separately, display the sum of the signals, and/or display various weighted sums of signals (e.g., a user-based weighting, a group-based weighting, etc.).

Users can express their interest in particular points or portions of media content by “liking” that point or portion while the content is playing in display window 820. In other implementations, users can navigate to a particular portion of the media content using the timeline and “like” the point or portion whether the media is playing, paused, or stopped. A user can actively “like” the media content using, e.g., a button on the interface 800, and/or by other suitable user input actions (e.g., touchscreen gesture, mouse gesture, tracked free-hand gesture, etc.) applied or directed to the display window 820, timeline 810, or other portion of the interface 800. The “like” can be visually represented on the timeline 810 to the user and/or other users viewing the same media content using a suitable graphical and/or textual indicator (e.g., colored shape, icon, pop-up text, combinations thereof, and so on). For example, in the depicted implementation, visual like indicators 840 are disposed at each location on the timeline where a user “liked” the corresponding media content.

Similarly, users can comment on a point or portion of the media content using, e.g., a button on the interface 800 that opens a text box, and/or other suitable user interface control. The user comments can be visually represented on the timeline 810 to the user and/or other users viewing the same media content. For example, in the depicted implementation, visual comment indicators 850 are disposed at each location on the timeline where a user commented on the corresponding media content. The visual comment indicators 850 can include a thumbnail of the user's avatar or profile image (e.g., corresponding to a social networking or other account), and/or can include other suitable graphical and/or textual indicators.

Users can also indicate their interest in the media content or portions thereof in various other manners. For example, a user can select a portion of the media content to share with others through a social network, such as Facebook. A user can also highlight or provide some other visual identification of a portion of interest through the interface 800, and the identification can then be made available to other users, individually or in combination with other indications of interest. In one implementation, a heat map can be used to identify varying levels of interest in portions of the media content.

The presence of likes, comments, recommendations, and/or other indications of social interest over the length of media content can constitute a social signal, which has an intensity that varies in relation to popularity. For example, portions of media content that have a higher concentration of likes, comments, and other indicators of interest relative to other portions of the media content will have a higher social signal intensity. In some implementations, the social signal can be used in further refining signal weightings for the corresponding media content or other media content. In one example, if a video is published in which the motion signal was heavily weighted compared to other signals, but other users, upon viewing the video, prefer portions of the video in which many faces appear (i.e., the social signal has a higher intensity at these face portions), then the social signal may cause future weightings of the media content or other media content to be biased more toward the facial recognition signal. More specifically, social signals can become part of the training data that influences the determination of signal weights.
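The biasing described in the foregoing example can be sketched, purely as an assumption-laden illustration, by blending the existing weights with each signal's share of overlap with the social signal. The overlap measure (a dot product), the rate parameter, and the function name refine_weights are hypothetical choices made for this sketch and are not prescribed by the disclosure.

```python
from typing import Dict, List


def refine_weights(weights: Dict[str, float],
                   signals: Dict[str, List[float]],
                   social: List[float],
                   rate: float = 0.5) -> Dict[str, float]:
    """Shift weights toward signals that are strong where interest is high."""
    # Overlap of each content signal with the social signal (dot product).
    overlap = {name: sum(a * b for a, b in zip(signals[name], social))
               for name in weights}
    total = sum(overlap.values())
    if total == 0:
        return dict(weights)  # no social evidence; keep existing weights
    return {name: (1 - rate) * weights[name] + rate * overlap[name] / total
            for name in weights}


# Motion was heavily weighted, but viewers liked the portions with faces.
old = {"motion": 0.8, "faces": 0.2}
content = {"motion": [1, 0, 0, 1], "faces": [0, 1, 1, 0]}
social_signal_example = [0, 3, 2, 0]
new = refine_weights(old, content, social_signal_example)
```

With these illustrative inputs the refined weighting favors the faces signal over the motion signal, mirroring the example in the text.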

Ultimately, the interface 800 provides for an overall “signature” that reflects the experience associated with a video (or other form of media content) and that can be quickly apprehended by a user accessing the particular video. The signature can include a visual representation of the implicit and/or explicit signals associated with the video (e.g., separate signals, a weighted combination), such as sound, motion, faces, and the social signal. Thus, from implicit signals, social signals, user labels, and other information associated with the video, a user can easily determine what types of content the video includes as well as what portions are most popular with other users.

In some implementations, multiple videos can be represented that are cotemporaneous on the timeline, for instance, as spatially stacked thumbnails contiguous in the y-axis and/or with edges stacked like a stack of papers to provide either a full representation of multiple points of view of the same event spatially displayed, or the foremost video stream with other points of view selectable by the user. The signature can represent the foremost video or the aggregate signature of all or a subset of the cotemporaneous videos. The set of potential cotemporaneous signals can include large sets of video points of view to represent a single event, such as a baseball game, or a class event, such as all snowboarding on planet Earth at this moment. The same stacking in a different axis (such as the x-axis) can be used to represent jumps in time from the same point of view. For instance, two contiguous pieces of video, two years apart from the same video source such as a home video camera, can be displayed one after the other. The stacking on multiple axes, either spatially contiguous or occluding, can be combined to represent a large number of points of view across a large amount of time, such as all wearable video cameras ever used at a football stadium. The resulting signature can represent either part or all of the videos.

FIG. 9 depicts an example method 900 for providing an experience map of media content. In STEP 902, a visual representation of a timeline of a video is provided (e.g., by application 115 via interface 800). The timeline includes a plurality of different time positions in the video. Visual representations of one or more signals along the timeline of the video are also provided, as described above (STEP 906). Each signal representation includes an intensity of an identifiable type of content over the time positions. Identifiable types of content can include, for example, motion, sound, presence of faces, recognized faces, recognized objects, recognized activities, and recognized scenes.

From each of a number of users, an indication of interest in a portion of the video is received via interface 800 (STEP 910), and each indication of interest is visually represented on the timeline (STEP 914). The indications of interest can be, for example, comments, likes, shares, highlights, and the like, and can be graphically and/or textually represented by the interface 800 as described above.

In some implementations, a social signal is determined based on the indications of interest. The intensity of the social signal over the length of the video can be based on the concentration of the indications of interest over the video length (e.g., more likes for a particular portion relative to other video portions equals a higher intensity social signal for that particular portion). As described above, the social signal can be incorporated into the training data of the system where it can influence determined signal weightings for various media content, in conjunction with social signals for other media content and other historical attributes associated with users and groups of users.
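One possible way to derive such a social signal, offered only as a sketch, is to count the indications of interest that fall within fixed-width time bins and to normalize the counts by the busiest bin. The bin width, the normalization, and the function name social_signal are assumptions made for illustration.

```python
import math
from typing import List


def social_signal(timestamps: List[float], video_length: float,
                  bin_seconds: float = 5.0) -> List[float]:
    """Intensity per time bin, from the concentration of likes, comments, etc."""
    n_bins = max(1, math.ceil(video_length / bin_seconds))
    counts = [0] * n_bins
    for ts in timestamps:
        if 0 <= ts <= video_length:
            # Clamp so an indication at the very end lands in the last bin.
            counts[min(int(ts // bin_seconds), n_bins - 1)] += 1
    peak = max(counts)
    if peak == 0:
        return [0.0] * n_bins
    return [count / peak for count in counts]


# Times (in seconds) at which users liked or commented on a 30-second video.
intensity = social_signal([1, 2, 3, 12, 27], video_length=30, bin_seconds=10)
```

Here the first ten seconds received three of the five indications, so that bin has the highest intensity and the remaining bins are scaled relative to it.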

In one implementation, signals and/or other attributes can be used to recommend appropriate filters for application (automatically or by a user) to media content. FIG. 10 illustrates an example method 1000 for providing filter recommendations. In STEP 1002, media content is received on the user device 110. For example, a digital photograph or video can be captured using a camera on the device 110, media content can be downloaded onto the device 110, and so on. The application 115 on the device 110 can provide an interface that allows the user to edit the media content by applying one or more digital filters to the content. A “filter” refers to a modifier of an image or other media content. Filters can provide color enhancement and conversion, changes to brightness and contrast, polarization, diffusion, special effects, and other changes to media content. The application 115 can provide filter functionality based on filters stored on the device 110, custom filters created by the user or other users, and/or filters defined by or retrieved from other sources, such as Instagram. The various available filters can be visually provided to the user for selection via a user interface such as that shown in FIGS. 11 and 12 (STEP 1006).

In STEP 1010, the application 115 determines a subset of the available filters (e.g., 4 to 8 filters) to recommend to the user based on one or more attributes. The attributes can relate to the media content (the entire content or a portion thereof), the user, and/or the mobile device. For example, attributes associated with the media content can include geolocation, location, points of interest, and a signal in the media content (e.g., motion, sound, recognized faces (known or unknown people), recognized objects, recognized activities, and recognized scenes). Attributes can be derived from metadata associated with the media content or by processing the content (e.g., audio, video, imagery) itself. For instance, to determine the geolocation of the content capturing device (e.g., mobile phone), or the location or nearby points of interest with respect to where the media content was captured, geotag information in the content can be examined. Further, a database or other data structure tracking points of interest (e.g., parks, museums, buildings, geographic features, etc.) can be correlated with the location information to identify which points of interest are located at or nearby the location associated with the media content. Points of interest can be matched directly or semantics in the name of the point of interest (e.g., the word “beach”) can be used. In addition, images and video can be processed to recognize particular activities (e.g., snowboarding, skydiving, driving, etc.), particular objects or scenes (e.g., the Empire State Building, the Boston skyline, a snow-covered mountain), weather, lighting conditions, and so on, and audio can be processed to further inform the recognition process (e.g., the sounds of a beach, a crowd, music, etc., can be processed to help identify a location or event).
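The correlation of geotag information with a points-of-interest data structure can be sketched as follows. The sketch assumes each point of interest is stored as a (name, latitude, longitude) tuple and uses the haversine great-circle distance; the coordinates, the search radius, the keyword list, and the function names are hypothetical and chosen only for illustration.

```python
import math
from typing import Iterable, List, Set, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float,
                 lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_points(lat: float, lon: float,
                  pois: Iterable[Tuple[str, float, float]],
                  radius_km: float = 1.0) -> List[str]:
    """Names of points of interest within radius_km of the geotag."""
    return [name for name, p_lat, p_lon in pois
            if haversine_km(lat, lon, p_lat, p_lon) <= radius_km]


def semantic_tags(names: Iterable[str],
                  keywords: Tuple[str, ...] = ("beach", "park", "museum")
                  ) -> Set[str]:
    """Keywords found in point-of-interest names (e.g., the word 'beach')."""
    return {word for name in names for word in keywords
            if word in name.lower()}


# Illustrative data only; these are not real places or coordinates.
pois = [("Example Beach", 0.0, 0.0), ("Example Museum", 1.0, 1.0)]
matches = nearby_points(0.001, 0.001, pois, radius_km=1.0)
tags = semantic_tags(matches)
```

The resulting names and tags can then serve as attributes, alongside recognized activities, objects, and scenes, when determining which filters to recommend.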

Attributes used to inform the recommendation of the filters can also include attributes associated with the device that captured the media content (e.g., the user device 110), such as device model, camera type, image sensor resolution, use of flash, white balance, exposure, other camera settings, and so on. This information can be derived, for example, from metadata included in the media content or by examining the capturing device settings and properties during or after capturing media content. Other attributes can include information associated with the device user, the user who captured the media content, or a group of users. For example, the previous behavior and/or preferences of a particular user or a group of users with respect to filter selection can influence what filters are recommended by the application 115. If, for instance, the user, the user's followers, all users, or another identified group select a sepia tone filter for portraits 50% of the time, then that filter can be highly recommended the next time the user captures a portrait-type photograph.

The various machine learning techniques described herein can be applied to filter selection. For instance, a classifier can be trained over time using as input attributes (such as those described above) for particular media content in combination with the filter or filters that were selected and applied to the media content. Thus, as an example, the classifier might learn over time that filters that reduce glare and enhance blue tones are often applied to media content that includes video or images of activities taking place on snow, such as skiing or snowboarding. As another example, the classifier might learn over time that filters that add warmth and reduce contrast are frequently selected for media content captured near Venice Beach. Accordingly, to determine a recommendation of filters that are likely to be selected by a user from all available filters, one or more attributes can be input to the classifier and the recommendation can be obtained as output. The recommendation can include one or more filters ranked in order of likelihood of selection by the user.
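As a simplified illustration of learning from past filter selections, the following sketch counts how often each filter was chosen for each attribute and ranks filters by those counts. A co-occurrence counter is used here as a stand-in for the classifier described above; the class name FilterRecommender, the attribute strings, and the filter names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


class FilterRecommender:
    """Hypothetical frequency-based stand-in for a trained classifier."""

    def __init__(self) -> None:
        # For each attribute, how many times each filter was selected.
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def record(self, attributes: Iterable[str], chosen_filter: str) -> None:
        """Add one training example: content attributes plus applied filter."""
        for attribute in attributes:
            self.counts[attribute][chosen_filter] += 1

    def recommend(self, attributes: Iterable[str], k: int = 4) -> List[str]:
        """Up to k filters, ranked by how often they accompanied the attributes."""
        scores: Counter = Counter()
        for attribute in attributes:
            scores.update(self.counts.get(attribute, {}))
        return [name for name, _ in scores.most_common(k)]


recommender = FilterRecommender()
recommender.record(["snow", "video"], "deglare_blue")
recommender.record(["snow", "video"], "deglare_blue")
recommender.record(["snow"], "sepia")
recommender.record(["beach"], "warm_low_contrast")
snow_ranking = recommender.recommend(["snow"], k=2)
beach_ranking = recommender.recommend(["beach"])
```

A production system could substitute any suitable classifier that accepts the same attributes and outputs a ranked list; the ranking order here approximates the likelihood of selection by relative frequency only.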

In STEP 1014, the application 115 visually identifies the recommended filters. For example, the application can provide a display of the media content or a portion thereof with a recommended filter applied, accompanied by the name of the filter. In the case of a video, the display can include multiple copies of an image frame from the video arranged on the device screen, each with a different recommended filter applied as a preview (see FIG. 11). The user can then select the desired filter to apply by interacting with the display (e.g., via touch). The user can also manipulate the media content to preview applied filters to other portions of the media content, and then apply one or more filters (recommended or not recommended) to all or a portion of the media content. Accordingly, in STEP 1018, the filter selection is received by the application 115, which applies the selected filter(s) to the designated media content or portion of the media content (STEP 1022).

The terms and expressions employed herein are used as terms and expressions of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described or portions thereof. In addition, having described certain implementations in the present disclosure, it will be apparent to those of ordinary skill in the art that other implementations incorporating the concepts disclosed herein can be used without departing from the spirit and scope of the invention. The features and functions of the various implementations can be arranged in various combinations and permutations, and all are considered to be within the scope of the disclosed invention. Accordingly, the described implementations are to be considered in all respects as illustrative and not restrictive. The configurations, materials, and dimensions described herein are also intended as illustrative and in no way limiting. Similarly, although physical explanations have been provided for explanatory purposes, there is no intent to be bound by any particular theory or mechanism, or to limit the claims in accordance therewith.

What is claimed is:
 1. A computer-implemented method comprising: providing a visual representation of a timeline of a video wherein the timeline comprises a plurality of different time positions in the video; providing a visual representation of one or more signals along the timeline of the video, wherein each signal representation comprises a respective intensity of the signal over the time positions; obtaining, for each of a plurality of users, an indication of interest of the user in a respective portion of the video; and providing a visual representation on the timeline of each indication of interest.
 2. The method of claim 1, wherein the visual representation of the signals comprises a visual representation of a weighted sum of signals.
 3. The method of claim 1, wherein a particular indication of interest comprises a comment, a like, a share, or a highlight.
 4. The method of claim 1, further comprising determining a social signal based on the indications of interest, wherein an intensity of the social signal over a length of the video is based on a concentration of the indications of interest over the length of the video.
 5. The method of claim 4, further comprising: receiving a second video comprising a plurality of signals, each signal representing an identifiable type of content over a length of the video; for at least one of the signals, associating a weighting with the signal, wherein the weighting is determined based at least in part on the social signal; and identifying one or more portions of interest in the second video based on the at least one signal weighting.
 6. The method of claim 5, wherein the weighting is determined based at least in part on the intensity of the social signal.
 7. The method of claim 5, wherein the weighting is determined based at least in part on indications of interest from a plurality of videos.
 8. The method of claim 1, wherein a particular signal represents one of motion, sound, presence of faces, recognized faces, recognized objects, recognized activities, recognized scenes, sensor readings, context, or user-specified.
 9. A system comprising: one or more computers programmed to perform operations comprising: providing a visual representation of a timeline of a video wherein the timeline comprises a plurality of different time positions in the video; providing a visual representation of one or more signals along the timeline of the video, wherein each signal representation comprises a respective intensity of the signal over the different time positions; receiving, from each of a plurality of users, an indication of interest in a portion of the video; and providing a visual representation on the timeline of each indication of interest.
 10. The system of claim 9, wherein the visual representation of the signals comprises a visual representation of a weighted sum of signals.
 11. The system of claim 9, wherein a particular indication of interest comprises a comment, a like, a share, or a highlight.
 12. The system of claim 9, wherein the operations further comprise determining a social signal based on the indications of interest, wherein an intensity of the social signal over a length of the video is based on a concentration of the indications of interest over the length of the video.
 13. The system of claim 12, wherein the operations further comprise: receiving a second video comprising a plurality of signals, each signal representing an identifiable type of content over a length of the video; for at least one of the signals, associating a weighting with the signal, wherein the weighting is determined based at least in part on the social signal; and identifying one or more portions of interest in the second video based on the at least one signal weighting.
 14. The system of claim 13, wherein the weighting is determined based at least in part on the intensity of the social signal.
 15. The system of claim 13, wherein the weighting is determined based at least in part on indications of interest from a plurality of videos.
 16. The system of claim 9, wherein a particular signal represents one of motion, sound, presence of faces, recognized faces, recognized objects, recognized activities, recognized scenes, sensor readings, context, or user-specified. 